Automated data entry system

ABSTRACT

The present invention is a novel system and method for providing automated data entry of information. The system may include a processing subsystem capable of processing extensible markup language documents, a predicting subsystem capable of reviewing entered data of an input document and providing a suggestion to the input document, and a database for storing parsed extensible markup language documents. Database may maintain a plurality of relationships among received data of previous documents, and predictor may analyze a plurality of relationships for suggestions. The method for providing automated data entry of information may monitor received data, analyze relationships among received data, and predict a data entry field. A data entry field may be based on received data and relationships among received data. Predicting of data entry fields may incorporate a plurality of predictors whereby one or more values from values provided by each of the plurality of predictors is provided to a user.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuing application of and claims priority to U.S. patent application entitled AUTOMATED DATA ENTRY SYSTEM, application Ser. No. 11/214,212, filed on Aug. 29, 2005, which is currently co-pending and claims the benefit under 35 U.S.C. §119 of U.S. Provisional Application No. 60/604,964, filed on Aug. 27, 2004 and U.S. Provisional Application No. 60/606,613, filed on Sep. 2, 2004. Said U.S. patent application Ser. No. 11/214,212, U.S. Provisional Patent Application 60/604,964, and U.S. Provisional Patent Application 60/606,613 are herein incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates generally to the field of data entry processing and more particularly to a system and method for providing automated data entry of information.

BACKGROUND OF THE INVENTION

Data entry processing is a widely known method for collecting and storing information. Manual data entry is utilized extensively in numerous industries to gather data and maintain databases. However, manual data entry is often a tedious, time consuming and labor intensive task. Additionally, data entry processing may be quite costly, often requiring companies to hire additional employees to accomplish data entry tasks, or outsource data entry tasks to data entry specialists. Further, because data entry is typically performed manually, the process is highly error-prone.

With the advent of e-commerce, consumers and businesses alike are often required to provide information to other parties via the Internet, usually by completing a web-based form or set of forms. Software tools have been developed to reduce data entry workload by automatically filling in empty data fields for certain types of forms. Such software programs have the ability to assist data entry by accessing predefined user information and filling in blank forms with the appropriate stored data. However, these programs are generally utilized with Hypertext Markup Language (HTML) generated web-based forms, such as those for consumers making purchases or otherwise providing information on a data entry form via an Internet web page, and have limited usefulness beyond such applications.

Conventional data field population methods are unsuitable for datasets that embody extensible markup language (XML) grammars. XML is a standard data exchange format that allows different communities to define their own tags and attribute names. Current field population systems may not be utilized with sophisticated XML grammars, which contain nested composition and complex tree-like structures. These programs generally only support data that is relatively unsophisticated and linearly structured, and often require a perfect match between an incomplete document and the values and documents already stored to complete the incomplete document. As a result, current programs are not able to predict values for data fields unless there is constant repetition, no variability and a pre-stored perfect match. In contrast, XML data may be highly variable and may not be pre-stored. As a result, XML documents require a high volume of manual data entry, which can be very labor intensive, time consuming and error-prone.

Consequently, it would be advantageous if a system and method existed to provide automated data entry of information supported by complex data structures.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed to a novel system and method for providing automated data entry of information. In a first aspect of the present invention, the system may include a processing subsystem capable of processing extensible markup language documents, a predicting subsystem capable of reviewing entered data of an input document and providing a suggestion to the input document, and a database for storing parsed extensible markup language documents. Database may maintain a plurality of relationships among received data of previous input information, and predicting subsystem may analyze a plurality of relationships to provide one or more suggestions.

In accordance with an additional aspect of the present invention, a method for automating data entry of information is provided. In an embodiment of the invention, the method may monitor received data, analyze relationships among received data, and predict a data entry field. Data entry field may be based on received data and relationships among received data. Predicting of data entry fields may incorporate a plurality of predictors whereby one or more values are provided by each of the plurality of predictors.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention claimed. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate an embodiment of the invention and together with the general description, serve to explain the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The numerous objects and advantages of the present invention may be better understood by those skilled in the art by reference to the accompanying figures in which:

FIG. 1 depicts a data entry system in accordance with an embodiment of the present invention;

FIG. 2 depicts a flowchart representing a method for automating data entry of information in accordance with an embodiment of the present invention;

FIG. 3 depicts a flowchart representing a method for processing information in accordance with an embodiment of the present invention;

FIG. 4 depicts a database of a data entry system in accordance with an embodiment of the present invention;

FIG. 5 depicts a predicting subsystem of a data entry system in accordance with an embodiment of the present invention;

FIG. 6 depicts a flowchart of a method for selecting a single data entry from a plurality of suggestions in accordance with an embodiment of the present invention;

FIG. 7 depicts a voting and aggregation method in accordance with an embodiment of the present invention;

FIG. 8 depicts a weighting method in accordance with an embodiment of the present invention; and

FIG. 9 depicts a method of checking a database for errors in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Reference will now be made in detail to a presently preferred embodiment of the invention, an example of which is illustrated in the accompanying drawings.

Referring to FIG. 1, a system 100 for automating data entry of information in accordance with an embodiment of the present invention is shown. System 100 may be comprised of a processing subsystem 110, a database 120, and a predicting subsystem 130. In an embodiment of the invention, processing subsystem 110 may be suitable for processing information from received documents 140-160, previously completed and supported by complex data structures, such as extensible markup language (XML) documents. It is contemplated that a system and method in accordance with the present invention may be suitable for other markup languages such as Hypertext Markup Language (HTML), Standard Generalized Markup Language (SGML) and the like. Database 120 may store parsed extensible markup language documents, and may further maintain a plurality of relationships among received data of previously completed documents. Predicting subsystem 130 may review entered data of an input document, such as an input form or a data entry form 170, and provide one or more suggestions for an uncompleted field of the data entry form 170. Predicting subsystem 130 may analyze a plurality of relationships for a provided suggestion. For example, the system may complete empty data fields of a data entry form 170. Predicting subsystem 130 may analyze previous user entered data in the data entry form 170 and historical documents in a database 120 to predict one or more values for a data field of a data entry form 170. Processing subsystem 110 may process a collection of documents simultaneously or a single document individually, such as when a document may be completed by a user. System 100 may provide a user with one or more suggestions, and a user may select a suggestion provided by the system, an alternative provided suggestion, or may overwrite the provided suggestions and enter a non-suggested value manually. The suggestion which is used, whether the first suggestion, an alternative suggestion or a user-entered value, may be determined to be the correct suggestion. For example, if a user selects a provided suggestion, suggestion may be stored in database 120, and the value may be analyzed by predicting subsystem 130 for providing future suggestions. Similarly, if a user enters a non-suggested value manually, processing subsystem 110 may process the value for storage in a database 120, and the value may be analyzed by predicting subsystem 130 for providing future suggestions.

Referring to FIG. 2, a flowchart representing a method 200 for automating data entry of information in accordance with an embodiment of the present invention is shown. It is contemplated that system 100 of FIG. 1 may execute method 200 for automating data entry of information. Method 200 may include monitoring received data 210, analyzing relationships among received data 220, and predicting a data entry field 230. Data entry field prediction 230 may be based on received data and relationships among received data, and may include incorporating a plurality of predictors. For example, a prediction may be determined from one or more statistical, inductive, instance-based, neural network, kernel machine, genetic algorithm, reinforcement learning, analytical learning, frequency, or recency predictors, or a combination of one or more types of predictors. It is contemplated that any type of prediction scheme or algorithm may be implemented in accordance with the present invention without departing from the scope and intent of the present invention. Method 200 may further include providing one or more suggestions for a data entry field 240 based on one or more data entry field predictions 230 by employing a voting and weighting scheme in accordance with the present invention as shown in FIGS. 5-8.

Referring to FIG. 3, a method 300 for processing information in accordance with an embodiment of the present invention is shown. In an embodiment of the invention, method 300 may be implemented by processing subsystem 110 of FIG. 1. Method 300 may begin by processing of document input information 310. An input document may be parsed 320 by processing subsystem 110, and the completed document may be stored in a document database. Node-value pairs and attribute-value pairs of a document may be retrieved 330. A node-value pair may be defined as the name of a node and a value associated with it. A node may represent a data element of an input document. For example, linearly structured data may contain a set of attributes describing data, events, objects, etc. A node-value pair may describe the name of data, an event or an object, and the values associated with it. An attribute-value pair may be defined as the name of an attribute of a node and the attribute values associated with it. With respect to extensible markup language documents, an attribute is generally an alternate linear method for describing a sub-tree. For example, a value of a node “Address” may be a sub-tree containing nodes such as “Street”, “City,” “State,” etc., each one with its own values. Alternatively, the same tree-like information may be described as a linear attribute: <Address Street=“Main” City=“NY” State=“NY”>. Document characteristics may be computed 340 from node-value pairs and attribute-value pairs. Processed data may be stored in an historical collection such as a database 350.
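By way of a non-limiting illustration, the retrieval of node-value pairs and attribute-value pairs 330 might be sketched as follows. The sketch assumes Python's standard xml.etree.ElementTree parser; the function name extract_pairs and the tuple layouts are hypothetical and are not part of the described system.

```python
import xml.etree.ElementTree as ET


def extract_pairs(xml_text):
    """Return (node-value pairs, attribute-value pairs) found in an XML string."""
    root = ET.fromstring(xml_text)
    node_values = []       # (node name, text value)
    attribute_values = []  # (node name, attribute name, attribute value)
    for element in root.iter():  # document-order walk over every node
        text = (element.text or "").strip()
        if text:
            node_values.append((element.tag, text))
        for name, value in element.attrib.items():
            attribute_values.append((element.tag, name, value))
    return node_values, attribute_values


# The same address written as a sub-tree and as linear attributes.
nested = "<Address><Street>Main</Street><City>NY</City><State>NY</State></Address>"
linear = '<Address Street="Main" City="NY" State="NY"/>'

print(extract_pairs(nested)[0])  # node-value pairs of the sub-tree form
print(extract_pairs(linear)[1])  # attribute-value pairs of the linear form
```

Both forms carry the same three values; only the first yields node-value pairs and only the second yields attribute-value pairs, which is why both kinds of pair are retrieved.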

Referring to FIG. 4, a block diagram of a database 400 of a data entry system in accordance with an embodiment of the present invention is shown. In a preferred embodiment, database 400 may be a collection of historical documents, and may include a data structure 410 suitable for short or long-term storage of parsed input data, and any relevant information contained therein. Database 400 may collaborate with individual predictors comprising a predicting subsystem 130. Data structure 410 of database 400 may contain an entire document collection in memory, or data structure may be capable of storing a representation of a document collection utilizing memory less than the cumulative size of the documents comprising the collection. Data structure 410 may further allow incremental addition of documents without re-indexing a collection, and may allow fast lookup of node values in a document for which a second node has a given value via fast look-up tables 440-460.

Data structure 410 may be a multidimensional array of value-occurrence objects. Multi-dimensional array may be dimensioned by a plurality of identifiers. In a preferred embodiment, multi-dimensional array may be dimensioned by a first identifier and a second identifier. First identifier may be a node identifier 420, and second identifier may be a document identifier 430. A node may recur in a given document, and a recurring node may assume multiple values. To account for a recurring node in a document, the data structure array may include a value-occurrence object list 470 including an element for a distinct value assumed by a node in a document. A value-occurrence object may consist of a value identifier and an integer count of the recurrence of a value assigned to a node in a document.

Data structure 410 may further include a plurality of lookup tables 440-460. Lookup tables 440-460 may hash strings to integer identifiers, mapping document names, node names, node values and the like to integer identifiers. Because node names and node values may often be repeated within a document or across multiple documents in a collection, but may only be stored in the database once, memory usage may be reduced. Additionally, each record in a data structure 410 may store an instance of a node in a document with a particular value and a reference to the document in which the node appeared.
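A minimal sketch of such a data structure is given below for illustration only. It assumes a dictionary-backed table in place of a true multidimensional array, and the class and method names (LookupTable, DocumentStore, intern, find, occurrences) are hypothetical.

```python
from collections import Counter


class LookupTable:
    """Maps each distinct string to an integer identifier; each string is stored once."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def intern(self, name):
        # Assign the next integer identifier the first time a string is seen.
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def find(self, name):
        return self._ids.get(name)  # None when the string is unknown

    def name(self, ident):
        return self._names[ident]

    def __len__(self):
        return len(self._names)


class DocumentStore:
    """Value-occurrence counts indexed by (node identifier, document identifier)."""

    def __init__(self):
        self.documents = LookupTable()
        self.nodes = LookupTable()
        self.values = LookupTable()
        self.cells = {}  # (node_id, doc_id) -> Counter {value_id: count}

    def add(self, doc_name, node_value_pairs):
        # Documents are added incrementally; existing entries are never re-indexed.
        doc_id = self.documents.intern(doc_name)
        for node, value in node_value_pairs:
            key = (self.nodes.intern(node), doc_id)
            self.cells.setdefault(key, Counter())[self.values.intern(value)] += 1
        return doc_id

    def occurrences(self, doc_name, node):
        """Return {value: count} for one node in one document."""
        key = (self.nodes.find(node), self.documents.find(doc_name))
        counts = self.cells.get(key, {})
        return {self.values.name(v): c for v, c in counts.items()}


store = DocumentStore()
store.add("doc1", [("Phone", "555-0100"), ("Phone", "555-0100"),
                   ("Phone", "555-0199"), ("City", "NY")])
print(store.occurrences("doc1", "Phone"))
```

Each Counter plays the role of a value-occurrence object list: one element per distinct value of a recurring node, with an integer recurrence count.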

Referring to FIG. 5, a block diagram of a predicting subsystem 130 of a data entry system in accordance with an embodiment of the present invention is shown. Predicting subsystem 130 may include a plurality of predictors 510-530, a weighting system 540 and a suggestion aggregator 550. Predicting subsystem 130 may be capable of reviewing the entered data of an input document and providing a suggestion to an uncompleted field of the input document. In a preferred embodiment, predicting subsystem 130 may utilize perfect or approximate equivalency techniques to predict and propose a value for an empty field. Predicting subsystem 130 may further adapt to various domains without customizing algorithms. For example, it is contemplated that an individual predictor may be more suitable for predicting data in a particular domain. A first predictor may more accurately predict data within a domain such as biology, while another may more accurately predict data within a domain such as accounting. Predicting subsystem 130 may determine which of the plurality of predictors provides the most accurate value for a node in a given collection, and one predictor may be preferred over another for future suggestion prediction based on accuracy and success of prediction.

Predicting subsystem 130 may make predictions by determining an exact or approximate equivalent between values in an historical document collection and values in a partially completed new document. Predicting subsystem 130 may employ a plurality of predictive or descriptive algorithms such as machine learning techniques to predict node values. Specifically, predicting subsystem 130 may include a plurality of predictors 510-530 including one or more statistical, inductive, instance-based, neural network, kernel machine, genetic algorithm, reinforcement learning, analytical learning, frequency, recency or like predictors. It is further contemplated that the number and type of predictors included in a predicting subsystem may vary according to system need and may include predictors not specifically enumerated.

Plurality of predictors 510-530 may include one or more statistical predictors. In an embodiment of the invention, statistical predictor may be a method of supervised learning. For example, statistical predictor may be a naïve Bayesian classifier. Statistical predictor may utilize predictive and descriptive techniques to predict node values by analyzing relationships between independent and dependent variables and derive a conditional probability for each relationship. New values may be predicted by combining the effects of independent variables on dependent variables. Statistical predictor may then determine a probability for a node value based on the conditional probabilities of the predicted value given the actual values of previously determined nodes. For example, given values of n fields: V₁, V₂, . . . , V_(n), and given a possible value A for an empty node, a naïve Bayesian classifier may determine the probability that A is predictably correct by the equation:

${P(A)} = \frac{\prod\limits_{i = 1}^{n}\; {P\left( {AV_{i}} \right)}}{{\prod\limits_{i = 1}^{n}\; {P\left( {AV_{i}} \right)}} + {\prod\limits_{i = 1}^{n}\; \left( {1 - {P\left( {AV_{i}} \right)}} \right)}}$

Statistical predictor may be modified to account for causal links between two or more nodes via a rule inductive learning algorithm. Rule inductive learning algorithm may be applied to a document database and may then generate a subset of parameters for database documents. For example, to predict a value for a node, X, inductive learning algorithm may determine that if a value of a node N is V_(n), then the value of node X is V_(x), indicating a causal relationship between node N and node X. Statistical predictor may analyze node values determined by inductive learning algorithm to be causally related. Statistical predictor may then provide a suggestion determined from the probability analysis and the learned causal relationship analysis.

Plurality of predictors 510-530 may further include one or more inductive predictors. In a preferred embodiment, inductive predictor may be an inductive machine learning algorithm. For example, inductive predictor may be a C4.5 inductive learning predictor, or a like predictor capable of predicting node values based on other node values. C4.5 inductive learning predictor may utilize one or more decision trees for predicting data. Decision trees may be formed from historical data in a database, which may be translated into if-then rules. A plurality of values may be calculated for a translated rule. For example, a value for rule applicability percentage may be calculated, as well as a value for error. Applicability percentage values may be calculated from a number of applicable instances, such as a quantity of documents to which a rule may apply, and a total number of instances, such as the frequency of node appearance in a document collection. Error values may be calculated from the number of instances for which a rule may be applicable but provides an incorrect value.

C4.5 inductive learning predictor decision trees may be pruned based on a percentage of instances where a rule is determined to be applicable. For example, a decision tree may be pruned based on a determination that there may be 10% of instances where a rule is applicable. A lower threshold of applicability may be established to eliminate rules whose applicability may be less than or equal to the threshold percentage.
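The rule statistics and threshold pruning described above might be sketched as follows. This is not an implementation of C4.5 itself; it assumes rules have already been extracted from a decision tree and are represented by a hypothetical four-element tuple (condition node, condition value, target node, predicted value). Taking the total number of instances to be the number of documents, and expressing error as a fraction of applicable instances, are further assumptions of the sketch.

```python
def rule_stats(rule, documents):
    """Return (applicability, error) of one if-then rule over a document list."""
    condition_node, condition_value, target_node, predicted_value = rule
    # Instances to which the rule applies: its condition holds in the document.
    applicable = [d for d in documents if d.get(condition_node) == condition_value]
    # Applicable instances for which the rule gives the wrong value.
    incorrect = [d for d in applicable if d.get(target_node) != predicted_value]
    applicability = len(applicable) / len(documents) if documents else 0.0
    error = len(incorrect) / len(applicable) if applicable else 0.0
    return applicability, error


def prune(rules, documents, threshold=0.10):
    """Drop rules whose applicability is less than or equal to the threshold."""
    return [r for r in rules if rule_stats(r, documents)[0] > threshold]


documents = (
    [{"State": "NY", "City": "NY"}] * 2
    + [{"State": "NY", "City": "Albany"}]
    + [{"State": "CA", "City": "LA"}] * 7
)
rules = [
    ("State", "NY", "City", "NY"),      # applies to 3 of 10 documents
    ("City", "Albany", "State", "NY"),  # applies to 1 of 10 documents
    ("State", "TX", "City", "Austin"),  # applies to none
]
print(prune(rules, documents))
```

With a 10% threshold only the first rule survives: the second applies to exactly 10% of instances and is eliminated because the comparison is "less than or equal to".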

Plurality of predictors 510-530 may further include one or more instance-based predictors. In a preferred embodiment, instance-based predictor may be a K-Nearest-Neighbor (KNN) predictor, which identifies the k nearest neighbors of a given object. A KNN predictor in accordance with an aspect of the present invention may include an initial database of documents and a distance metric defining a distance between documents. The KNN algorithm may then locate the k document or documents in a database within the closest proximity to a document to be entered, based on a distance defined by the distance metric. The new document may be predicted to have the same outcome as the predominant outcome in the k closest document or documents in the training data.
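As a non-limiting sketch, a KNN predictor over flat field dictionaries might look like the following. The distance metric shown (a count of disagreeing fields) is an assumption, since no particular metric is specified above, and the names distance and knn_predict are hypothetical.

```python
from collections import Counter


def distance(doc_a, doc_b, fields):
    """Number of the given fields on which two documents disagree."""
    return sum(1 for f in fields if doc_a.get(f) != doc_b.get(f))


def knn_predict(partial, history, target, k=3):
    """Suggest a value for `target` from the k historical documents nearest to `partial`."""
    fields = [f for f in partial if f != target]    # compare only entered fields
    candidates = [d for d in history if target in d]
    nearest = sorted(candidates, key=lambda d: distance(partial, d, fields))[:k]
    if not nearest:
        return None  # no historical evidence: null suggestion
    votes = Counter(d[target] for d in nearest)
    return votes.most_common(1)[0][0]               # predominant outcome


history = [
    {"age": "young", "astigmatic": "no", "lens": "soft"},
    {"age": "young", "astigmatic": "no", "lens": "soft"},
    {"age": "young", "astigmatic": "yes", "lens": "hard"},
    {"age": "old", "astigmatic": "yes", "lens": "none"},
]
print(knn_predict({"age": "young", "astigmatic": "no"}, history, "lens"))
```

Here the three nearest documents carry the outcomes soft, soft and hard, so the predominant outcome, soft, is suggested.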

A predicting subsystem 130 of a system in accordance with the present invention may further include a weighting system 540 and suggestion aggregator 550. Proposed node values (data entry field suggestions) may be provided by plurality of predictors 510-530. It is contemplated that each predictor of one or more predictors 510-530 may make a plurality of suggestions. For example, predictors 510-530 may provide zero, or one or more node values. If multiple suggestions are provided, individual predictors may determine probable correctness for each suggestion, and rankings may be assigned to suggestions based on the probable correctness determination. Predicting subsystem 130 may determine a single value based upon a voting scheme among the plurality of predictors 510-530. A single selected value may represent the value receiving the largest number of votes among the plurality of predictors. This may be advantageous as the user may only be supplied with a single optimal prediction rather than multiple predictions.

Referring to FIG. 6, a flowchart of a method 600 for selecting a single data entry from a plurality of suggestions in accordance with an embodiment of the present invention is shown. Method 600 may be implemented as a voting scheme executed by suggestion aggregator 550 of FIG. 5. Votes may be counted for a plurality of suggested values 610. Predictor output may be aggregated into a single node value suggestion 620. Aggregating may be accomplished by determining the frequency of occurrence of a distinct suggested value and selecting the value with the maximum occurrence. It is contemplated that one or more predictors may suggest one or more node values from a plurality of possible node values. If a predictor suggests more than one node value, suggestions may be received by an aggregator in a hierarchical order. For example, if a predictor returns N suggestions, a first suggestion may receive a value of N, a second consecutive suggestion may receive the value of N−1, and a third consecutive suggestion may receive a value of N−2. Contributions of ranked suggestions may be accounted for by proportionally distributing votes to suggestions based on a maximum number of suggestions returned by a predictor. In a preferred embodiment, votes may be allocated to a suggestion by subtracting a suggestion's rank from the maximum number of suggestions, and adding an additional vote. Predicting subsystem 130 may then total votes received by individual suggestions 630, and the value with the largest number of votes may be selected as a best overall suggestion. Best overall suggestion, as well as alternative high ranking suggestions, may be provided 640 to a user, who may select one of the provided suggestions, or may overwrite the system and enter a value manually. In an alternative embodiment of the invention, a single suggestion (the best overall suggestion) may be provided. If a user selects a provided suggestion, suggestion may be stored in database 120 and predicting subsystem 130 may include suggestion in a future suggestion prediction. Also, if a user enters a non-suggested value manually, processing subsystem 110 may process the value for storage in a database 120, and the value may be analyzed by predicting subsystem 130 for providing future suggestions.

Referring to FIG. 7, an exemplary voting and aggregation method 700 in accordance with an embodiment of the present invention is shown. One or more predictors 710-730 may provide multiple values for an incomplete data entry field and a multiple ranked suggestion voting aggregator 740 may select a best overall suggestion. A value for the node insurance_plan::type may have one of four possible values: HMO, PPO, Self-Coverage, or None. Predictor A 710 may return three suggestions, with PPO being the highest ranked. Predictor B 720 may return a single suggestion, HMO, and Predictor C 730 may return three suggestions, also with PPO being the most probable. In this case, the maximum number of suggestions returned by any one predictor is 3, so the highest ranked suggestion for each predictor will be assigned 3−1+1 votes, the second suggestion from each predictor will be assigned 3−2+1 votes and the lowest ranked suggestion will be assigned 3−3+1 votes. The class value PPO receives three votes from Predictor A and three votes from Predictor C, giving it six in total, the largest number of votes. PPO 750 may be considered the best overall suggestion. A user may accept PPO as the correct suggestion, may select one of the alternative suggestions, or may manually enter a non-suggested value.
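The ranked voting of this example may be sketched as follows. The function name aggregate is hypothetical, and the second and third ranked suggestions of Predictors A and C are assumed values chosen only to complete the illustration; the text above specifies only that PPO is ranked first by both.

```python
from collections import defaultdict


def aggregate(ranked_lists):
    """Total rank-based votes per suggested value, best first.

    ranked_lists: {predictor name: [suggestions, highest ranked first]}.
    A suggestion of rank r receives (max_len - r + 1) votes, where max_len is
    the largest number of suggestions returned by any one predictor.
    """
    max_len = max((len(s) for s in ranked_lists.values()), default=0)
    totals = defaultdict(int)
    for suggestions in ranked_lists.values():
        for rank, value in enumerate(suggestions, start=1):
            totals[value] += max_len - rank + 1
    return sorted(totals.items(), key=lambda item: -item[1])


suggestions = {
    "A": ["PPO", "HMO", "None"],            # lower ranks are assumed
    "B": ["HMO"],
    "C": ["PPO", "Self-Coverage", "None"],  # lower ranks are assumed
}
ranking = aggregate(suggestions)
print(ranking[0])  # ('PPO', 6)
```

PPO collects 3 votes from Predictor A and 3 from Predictor C for a total of 6, matching the figure's result; under the assumed lower ranks HMO follows with 5.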

Voting and aggregation method 700 may form a consensus decision electing a value most suggested by the plurality of predictors 710-730. Voting may be determined from past accuracy of proposed values for a given node in a document collection for an individual predictor. An individual predictor may be assigned a weight corresponding to the predictor's past accuracy. When the predictor selects a suggestion, individual predictor weight assignment may be updated to record the accuracy of the suggestion.

A system in accordance with the present invention may calculate an error rate for predicted suggestions. Error rate may be determined from the number of incorrect suggestions over the total number of suggestions. Error rate may provide performance feedback for a predictor on a node. Performance feedback may be utilized to determine the weight of a predictor's suggestion. Additionally, error rate calculation may be useful, as one or more individual predictors may provide a null suggestion. A null suggestion may occur if a current set of instances does not meet one or more assumptions of a prediction algorithm. Failure to meet an assumption may prevent an algorithm from executing, resulting in a null suggestion. A null suggestion may also be returned from a predictor if its algorithm is terminated during a suggestion process due to failure to provide a suggestion within specified time parameters.

Referring to FIG. 8, an exemplary weighting method 800 in accordance with the present invention is shown. One or more predictors 810-830 may provide multiple values for an incomplete data entry field and a multiple ranked suggestion voting aggregator 840 may provide suggestions based upon values cast by each predictor 810-830 and the weighting applied to each predictor. In one embodiment of the invention, a single best overall suggestion may be provided to the user based upon the voting and weighting method of the present invention. A weighting system may determine and analyze a predictor's previous performance and assign a weight to a suggested value provided by a predictor. A predictor may perform more or less accurately on different nodes. For example, Predictor A may be a K Nearest Neighbor predictor, and may be assigned a weight of 0.67. Predictor B may be a C4.5 predictor, and may be assigned a weight of 0.21. Predictor C may be a naïve Bayesian predictor, and may be assigned a weight of 0.45. Accuracy of performance may depend on various parameters, including frequency of node appearance in a database, and type and statistical distribution of values for a node across a database. Weighting system may determine algorithm performance on a node and weight votes for the predictor accordingly. A predictor subsystem may return multiple suggestions ranked by an individual predictor based on potential correctness. The weight of each internal predictor may be determined from past performance. For example, an internal predictor may be assigned greater or less weight based on its ranking of the correct value. In this manner, overall prediction accuracy of a predictor may be improved.

Error rate calculation may be utilized to determine predictor weight. Predictor weight may be calculated from the following:

${wt}_{i,n} = 1 - \text{error rate} \quad \text{or} \quad {wt}_{i,n} = 1 - \left( \frac{s_{i,n\ldots\mathrm{incorrect}}}{s_{i,n\ldots\mathrm{total}}} \right)$

(Where i is a predictor, n is a node, s_(i,n..incorrect) is the number of incorrect suggestions and s_(i,n..total) is a predictor's total number of suggestions.) The total number of suggestions per node given by an individual predictor and the total number of errors in suggestions for the node by each predictor may be recorded. Predictors may be initially assigned equivalent weights. Predictor weight may be updated after a suggestion is made. For example, if a suggestion made by a predictor for a given node is correct, the total for suggestions of the predictor for that node may be incremented. However, if a suggestion made by a predictor for a given node is incorrect, the total for incorrect suggestions by the predictor for that node may also be incremented. For example, a predictor may provide three suggestions ranked as [A, B, C]. If the correct value is A, the predictor may be assigned a weight of 1. If the correct value is B, the predictor may be assigned a weight of 0.66. If the correct value is C, the predictor may be assigned a weight of 0.33. If the correct value is a value other than A, B or C, a predictor may not be assigned a weight. Weight assignment may be defined in a variety of ways, and may not be limited to equal distribution of weight to suggestions or parameters such as rank, number of provided suggestions or the like.
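For illustration only, the bookkeeping behind the weight formula might be sketched as below. The class PredictorWeights and function rank_credit are hypothetical names. Starting every predictor at a weight of 1.0, and computing rank credit as (N − rank + 1)/N, are assumptions of the sketch; the latter reproduces the 1, 0.66 and 0.33 values of the example up to rounding.

```python
class PredictorWeights:
    """Tracks, per (predictor, node), total and incorrect suggestion counts."""

    def __init__(self):
        self.total = {}
        self.incorrect = {}

    def record(self, predictor, node, was_correct):
        key = (predictor, node)
        self.total[key] = self.total.get(key, 0) + 1
        if not was_correct:
            self.incorrect[key] = self.incorrect.get(key, 0) + 1

    def weight(self, predictor, node):
        """Weight = 1 - (incorrect suggestions / total suggestions)."""
        key = (predictor, node)
        total = self.total.get(key, 0)
        if total == 0:
            return 1.0  # assumption: all predictors start with the same weight
        return 1.0 - self.incorrect.get(key, 0) / total


def rank_credit(ranked, correct_value):
    """Credit for a ranked suggestion list given the value finally accepted."""
    if correct_value not in ranked:
        return None  # no weight is assigned when the correct value was not suggested
    rank = ranked.index(correct_value) + 1
    return (len(ranked) - rank + 1) / len(ranked)


weights = PredictorWeights()
for outcome in (True, True, True, False):
    weights.record("knn", "insurance_plan::type", outcome)
print(weights.weight("knn", "insurance_plan::type"))  # 0.75
```

One incorrect suggestion out of four gives an error rate of 0.25 and therefore a weight of 0.75 for that predictor on that node.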

Prior to summing votes, a vote may be multiplied by the normalized weight for the node and predictor. Predictor weight may be normalized to prevent favoring of predictors that return non-null suggestions more often. Normalization may be accomplished by the following:

${{Normalized}\mspace{14mu} {wt}_{i,n}} = \frac{{wt}_{i,n}}{\sum\limits_{i = 1}^{c}{wt}_{i,n}}$

(where wt_(i,n) is the weight assigned to the i^(th) predictor when suggesting a value for node n and c is the total number of predictors).
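A brief sketch of normalization followed by weighted vote summation is given below, using the exemplary weights 0.67, 0.21 and 0.45 of FIG. 8. The function names normalize and weighted_totals are hypothetical, and the lower ranked suggestions in the sample data are assumed for illustration.

```python
def normalize(weights):
    """Divide each predictor weight by the sum of all predictor weights."""
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: w / total for name, w in weights.items()}


def weighted_totals(ranked_lists, weights):
    """Sum rank-based votes, each multiplied by its predictor's normalized weight."""
    norm = normalize(weights)
    max_len = max((len(s) for s in ranked_lists.values()), default=0)
    totals = {}
    for predictor, suggestions in ranked_lists.items():
        for rank, value in enumerate(suggestions, start=1):
            votes = max_len - rank + 1
            totals[value] = totals.get(value, 0.0) + votes * norm[predictor]
    return totals


node_weights = {"A": 0.67, "B": 0.21, "C": 0.45}  # weights for one node
ranked = {
    "A": ["PPO", "HMO", "None"],            # lower ranks are assumed
    "B": ["HMO"],
    "C": ["PPO", "Self-Coverage", "None"],  # lower ranks are assumed
}
totals = weighted_totals(ranked, node_weights)
print(max(totals, key=totals.get))  # PPO
```

After normalization the weights sum to one, so a predictor's influence depends on its share of the total weight for the node rather than on the raw magnitude of its weight.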

A system in accordance with an embodiment of the present invention may be utilized for classification prediction for samples of data sets comprised of linearly structured data. Linearly structured data sets typically consist of a set of attributes describing data, events, objects or the like. Predictions may be determined from values of data stored in the same sort of database format or data entry forms with linear structures. For example, a data set for eyewear may be described by a set of attributes including age of patient, spectacle prescription, astigmatic, tear production rate, and the like. Classifications may arise for a sample of this data set, such as fit patient with hard contact lenses, fit patient with soft contact lenses, do not fit patient with contact lenses. Based on the attribute values, predictor may then predict the appropriate classification for a sample and provide suggestions. A user may then accept a suggestion, or overwrite the suggestion and manually enter a desired classification. If a user selects a provided suggestion, suggestion may be stored in database 120, and the value may be analyzed by predicting subsystem 130 for providing future suggestions. If a user manually provides a classification value, the value may be stored by the system in the database, and the predictor may include manually provided suggestion in a future classification prediction.

In an embodiment of the invention, data entry system 100 of FIG. 1 may be employed to provide database error detection. An input database may contain numerous errors due to inaccurate entry or data corruption. Data entry system 100 may be employed to detect errors in the input database and provide one or more suggestions for fields where errors are detected. Processing subsystem 110 may be capable of receiving completed databases. Database 120 may store parsed extensible markup language documents, and may further maintain a plurality of relationships among received data of previously received databases. Predicting subsystem 130 may review an input database and check entries of the input database. Predicting subsystem 130 may analyze a plurality of relationships stored in database 120 to predict an entry for the input database. If the predicted entry is a match with the completed entry, the database entry may be judged as a correct entry. If the predicted entry is not a match with the entered information, an error alert may be provided.

When an error is detected, predicting subsystem 130 may provide an error message, which may be a visual indication. For example, potential errors may be underlined, or a list of potential errors may be presented to the user, similar to conventional word processing error detection systems. It is contemplated that any detected errors may be supplemented with one or more suggestions, as discussed for an uncompleted field, whereby a user may easily correct the error by accepting the suggestion or by manually overwriting the suggestion with another value. Any fields of the input database that are not completed may also be judged as an error, and predicting subsystem 130 may provide one or more suggestions to complete the empty field.

Referring to FIG. 9, a method 900 of checking a database for errors in accordance with an embodiment of the present invention is shown. Method 900 may begin by monitoring received data from previously completed databases 910. Relationships among data of received databases may be determined 920. An entry of said input database may be predicted based upon received data and relationships among received data 930. It is contemplated that prediction of entries may be implemented as previously described in FIGS. 1-8. Detection of an error may be determined 940 by comparing said predicted entry and an entered entry of said input database. If the predicted entry matches the entered entry, the entered entry may be judged as a correct entry. If the predicted entry does not match the entered entry, then the entered entry may be judged as an incorrect entry, or an error. An alert may be provided when an error is detected 950. It is further contemplated that one or more predicted entries may be provided to the user for correction of any detected errors.
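The steps of method 900 may be sketched as follows. This is a hedged illustration rather than the method of any particular embodiment: the function names and record layout are hypothetical, records are assumed to be flat dictionaries of field names to values, and the relationship learned is simply which value of a target field most often accompanies a given combination of the other fields.

```python
from collections import Counter, defaultdict


def learn_relationships(completed_records, target):
    """Steps 910-920: tally target values seen with each combination of other fields."""
    table = defaultdict(Counter)
    for record in completed_records:
        context = tuple(sorted((k, v) for k, v in record.items() if k != target))
        table[context][record[target]] += 1
    return table


def check_entry(record, target, table):
    """Steps 930-950: predict the target field, compare, and report any mismatch.

    Returns None when the entry matches the prediction or no prediction is possible.
    Otherwise returns an alert carrying the predicted entry as a suggestion.
    """
    context = tuple(sorted((k, v) for k, v in record.items() if k != target))
    counts = table.get(context)
    if not counts:
        return None  # no basis for a prediction
    predicted = counts.most_common(1)[0][0]
    entered = record.get(target)  # None for an uncompleted field
    if entered == predicted:
        return None
    return {"field": target, "entered": entered, "suggestion": predicted}
```

In this sketch an uncompleted field is also reported, with the predicted entry offered as the suggestion, which mirrors the handling of empty fields described above.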

It is believed that the present invention and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction, and arrangement of the components thereof without departing from the scope and spirit of the invention or without sacrificing all of its material advantages. The form hereinbefore described being merely an explanatory embodiment thereof, it is the intention of the following claims to encompass and include such changes.

1. A system for automatic data entry of information, comprising: a processing subsystem, said processing subsystem being configured to receive documents and parse data entered in said documents; a database, said database being configured to store parsed data, wherein a plurality of relationships developed from said parsed data is maintained; and a predicting subsystem, said predicting subsystem being configured to review entered data of an input document and said plurality of relationships from received documents and provide a suggestion for an uncompleted field of said input document.
 2. The system as claimed in claim 1, wherein said documents are extensible markup language documents.
 3. The system as claimed in claim 1, wherein said processing subsystem parses data of a completed document and stores the parsed data of said completed document in said database.
 4. The system as claimed in claim 1, wherein said plurality of relationships developed from said parsed data includes node-value pairs and attribute-value pairs.
 5. The system as claimed in claim 1, wherein said predicting subsystem includes a plurality of individual predictors.
 6. The system as claimed in claim 5, wherein said plurality of individual predictors include at least one of a statistical predictor, an inductive predictor or an instance-based predictor.
 7. The system as claimed in claim 6, wherein said statistical predictor is a naïve Bayesian classifier.
 8. The system as claimed in claim 6, wherein said inductive predictor is a C4.5 inductive learning predictor.
 9. The system as claimed in claim 6, wherein said instance-based predictor is a K-Nearest-Neighbor predictor.
 10. The system as claimed in claim 5, wherein said predicting subsystem includes a suggestion aggregator.
 11. The system as claimed in claim 10, wherein said suggestion aggregator provides a single suggestion based upon a voting scheme among the plurality of predictors.
 12. The system as claimed in claim 11, wherein said predictor includes a weighting system.
 13. The system as claimed in claim 12, wherein said weighting system determines performance of each individual predictor of said plurality of individual predictors and weights votes of each individual predictor based upon past performance.
 14. The system as claimed in claim 13, wherein said weighting system assigns a weight to each individual predictor of a plurality of individual predictors based on a percentage of correctly provided suggestions of a total number of suggestions.
 15. The system as claimed in claim 14, wherein said weighting system adapts to multiple data domain types through adjustment of weighting for each individual predictor of said plurality of individual predictors based on past performance.
 16. The system as claimed in claim 13, wherein said suggestion aggregator provides a single suggestion based upon a voting scheme among the plurality of predictors in combination with weighting of each vote from each individual predictor.
 17. A data entry system for detecting errors in an input database comprising: a processing subsystem receiving one or more completed databases, the one or more completed databases storing parsed extensible markup language documents, and maintaining a plurality of relationships among received data of one or more previously received databases; and a predicting subsystem reviewing an input database, checking entries of the input database, analyzing a plurality of relationships stored in the one or more databases to predict an entry for an input database, and providing one or more suggestions for a field where an error is detected.
 18. The system as claimed in claim 17, wherein said processing subsystem parses data of a completed document and transmits the parsed data of said completed document to said completed database.
 19. The system as claimed in claim 17, wherein said plurality of relationships among received data includes node-value pairs and attribute-value pairs.
 20. A method for checking errors in a database, comprising: monitoring received data from completed databases; determining relationships among received data; predicting an entry of said input database, wherein said entry of said input database is predicted based upon received data and relationships among received data; detecting an error by comparing said predicted entry and an entered entry of said input database and, if the predicted entry matches the entered entry, determining the entered entry to be a correct entry or, if the predicted entry does not match the entered entry, determining the entered entry to be an error; providing an alert when said error is detected; and providing the predicted entry to said user for correction of said error.